Skip to content

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49

Merged
dallasfoster merged 4 commits intoNVIDIA:mainfrom
Ryan-Reese:fix/get-data-bounds-check
Apr 7, 2026
Merged

fix(data): add bounds check for out-of-range indices in Batch.get_data()#49
dallasfoster merged 4 commits intoNVIDIA:mainfrom
Ryan-Reese:fix/get-data-bounds-check

Conversation

@Ryan-Reese
Copy link
Copy Markdown
Contributor

@Ryan-Reese Ryan-Reese commented Apr 3, 2026

Description

Batch.get_data() adjusts negative indices via idx = num_graphs + idx but does not validate the result. When the adjusted index is still negative (e.g. get_data(-5) on a 3-graph batch), Python's negative indexing silently wraps around on the internal storage tensors. Node-level data (positions, forces) may come from one graph while system-level data (energies, cell) comes from a different graph, returning inconsistent data.

Reproduction:

batch = Batch.from_data_list([argon_data, hydrogen_data, oxygen_data])
d = batch.get_data(-5)  # adjusted idx = -2
# d.atomic_numbers → oxygen (graph 2 atoms)
# d.energies → hydrogen value (graph 1 system data)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • Added bounds check after negative index adjustment in Batch.get_data() to raise IndexError for any index outside [0, num_graphs)
  • Added parametrized regression test test_get_data_out_of_range_raises

Testing

  • Unit tests pass locally (make pytest)
  • Linting passes (make lint)
  • New tests added for new functionality meets coverage expectations?

Checklist

  • I have read and understand the Contributing Guidelines
  • I have updated the CHANGELOG.md
  • I have performed a self-review of my code
  • I have added docstrings to new functions/classes
  • I have updated the documentation (if applicable)

Batch.get_data() adjusts negative indices via `idx = num_graphs + idx`
but does not validate the result. When the adjusted index is still
negative (e.g. get_data(-5) on a 3-graph batch), Python's negative
indexing silently wraps around, returning node-level data from one
graph mixed with system-level data from another — silent data corruption.

Add a bounds check after the adjustment to raise IndexError for any
index outside [0, num_graphs).
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Ryan-Reese Ryan-Reese marked this pull request as ready for review April 3, 2026 00:23
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR adds a bounds check to Batch.get_data() before the negative-index adjustment, raising a clear IndexError for any index outside [-num_graphs, num_graphs). The fix correctly preserves the original index value in the error message (the check runs before idx = self.num_graphs + idx), and the accompanying parametrized tests cover both negative and positive out-of-range cases.

Important Files Changed

Filename Overview
nvalchemi/data/batch.py Adds pre-adjustment bounds check in get_data(); correctly raises IndexError with the original index value for out-of-range indices
test/data/test_batch.py Adds parametrized regression test covering negative and positive out-of-range indices for a 2-graph batch

Reviews (3): Last reviewed commit: "Merge branch 'main' into fix/get-data-bo..." | Re-trigger Greptile

@laserkelvin
Copy link
Copy Markdown
Collaborator

I posted a fix in #50: if that gets merged in before this PR then we'll see if the workflow runs as intended. Otherwise the tests pass like I said, so I would be fine with merging after you address my comment

@laserkelvin laserkelvin added needs-ci Needs to have CI run bug Something isn't working labels Apr 3, 2026
Validate bounds before adjusting negative indices so the error
message shows the caller's original value instead of the
intermediate adjusted index.
@Ryan-Reese Ryan-Reese requested a review from dallasfoster April 7, 2026 15:06
@dallasfoster dallasfoster added this pull request to the merge queue Apr 7, 2026
Merged via the queue into NVIDIA:main with commit 832d04a Apr 7, 2026
5 checks passed
@Ryan-Reese Ryan-Reese deleted the fix/get-data-bounds-check branch April 8, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-ci Needs to have CI run

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants